1.   Introduction

The  general  problem  considered  here  is  that  of  determining 
or  reconstructing  a  completely  ordered  linear  string  (sequence)  of 
terminals,  given  the  results  of  some  fragmentation  of  the  string. 
The degree of order assumed in the fragmented string, i.e. the
properties of the input data, is crucial both in defining the class
of problems which can be solved by a given method and in relating the
abstract problem to physical applications.   Here, we will consider
a  class  of  problems  for  the  idealized  case  of  unambiguous  fragment 
identification.   An  algorithm  and  program  will  be  given  to  solve  this 
problem. 

The  application  of  sequence  reconstruction  which  motivated 
our  interest  in  the  problem  is  the  area  of  monomer  sequencing  in  a 
polymer.  The  goal  here  is  to  determine  the  linear  ordering  of  the 
(monomer)  subunits  in  natural  polymers,  which  are  chains  of  monomers 
configured  into  complex  three-dimensional  structures.  Two  examples 
which  have  received  the  most  attention  are  amino  acid  sequencing  in 
proteins  and  nucleotide  sequencing  in  deoxyribonucleic  acid  (DNA)  and 
ribonucleic  acid  (RNA). 

For an introduction to polymers see Natural High Polymers
[1].   Reference [2] attempts to collect all extant material on
protein and nucleic acid sequences, and will be published annually.
References [3] and [4] contain specific algorithms for special cases
of sequence reconstruction and will be discussed later.

Another possible area for the application of sequence
reconstruction is coding theory or cryptography.   Many copies of a
desired  message  could  be  fragmented  and  then  transmitted.   In  this 
case,  errors  in  the  communication  channel  could  produce  the  ambiguous 
fragment  identification  class  of  problems. 
2.  Problem Definition

Let  S  be  a  string  over  some  finite  alphabet,  V.  That  is,  S  is 
an  ordered  string  of  characters  (terminals)  which  belong  to  V. 
Assume that S has been broken up into a set of substrings and that
the  order  of  the  characters  within  each  substring  is  not  known.   An 
unordered  substring  will  be  called  a  fragment. 

The  general  problem  considered  here  is  to  determine  the 
original  sequence  of  characters  in  S  given  some  complete  sets  of 
fragments  from  S . 

Definitions  and  Notation 

The lowest level elements, i.e. those which are never
subdivided, are called terminals.  Terminals will be represented by upper
case alphanumeric characters.

A sequence is an ordered set of terminals with repetitions
allowed.  A sequence will be represented as a string of terminals
separated by dashes.   Example:  T-H-I-S-I-S-A-S-E-Q-U-E-N-C-E

A fragment is an unordered set of terminals with repetitions
allowed.  A fragment will be represented as a string of concatenated
characters.   Example:   THISISAFRAGMENT

A chain is an ordered set of fragments.   A chain will be
represented as a string of fragments separated by dashes.   Example:
THIS-ISA-CH-AIN.

Note:   If each fragment in a chain contains only one terminal, the
chain is a sequence.

A copy is a collection of chains, either unordered or partially
ordered.   An unordered copy will be represented as a string of chains
separated by asterisks.   Examples:   TH-IS*IS*A*C-O-PY
ASTER-ISKS*ME-AN*NO-ORDER.

A partially ordered copy in which the order of some chains is fixed
will be represented by delimiting the chains with fixed positions by
dashes.   Example:   THE*CH-AIN*-CHAI-N1-*IS*FIXED
Any copy represents a finite number of distinct sequences.
The copy will be called a valid copy for each of the sequences
which it represents.

With  this  terminology,  summarized  in  Table  1,  we  can  now 
restate  the  sequence  determination  problem.   Given  a  sequence  S  we 
obtain  a  number  of  copies  which  are  assumed  to  be  valid  copies  for  S. 
The  problem  is  to  determine  the  original  sequence  S  from  the  set 
of copies.   It is apparent that the set of all possible sequences
for which the set of copies is valid is, in fact, just the
intersection of the sets of sequences for which each individual copy is
valid.   If this intersection contains more than one element, the
original sequence cannot be uniquely determined from the input copies.
The program SEQ1 determines the set of possible sequences given a
set of input copies without fixed chains.   The result is a set of
copies which represent all sequences consistent with the input.
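
For small inputs the intersection can be checked by brute force.
The Python sketch below is not part of SEQ1, and its function names
are hypothetical. It assumes copies without fixed chains, written
with asterisks between chains and dashes between fragments, and it
uses each chain only in its written direction (which makes no
difference for input copies whose chains are single fragments).
The three copies are those of the S-M-A-L-L-T-E-S-T example of
Section 4; the second copy is read here as SM * ALL * TES * T, which
is an assumption.

```python
from itertools import permutations, product


def sequences_of(copy: str) -> set[str]:
    """Return every sequence represented by an unordered copy.

    Chains are separated by '*', fragments within a chain by '-'.
    Chains may appear in any order, terminals inside a fragment may
    appear in any order, and fragments within a chain keep their order.
    """
    chains = [c.strip().split("-") for c in copy.split("*")]
    result = set()
    for chain_order in permutations(chains):
        # Flatten the chosen chain order into one list of fragments.
        fragments = [frag for chain in chain_order for frag in chain]
        # Every distinct ordering of the terminals inside each fragment.
        choices = [set(permutations(frag)) for frag in fragments]
        for pick in product(*choices):
            result.add("".join("".join(p) for p in pick))
    return result


def consistent_sequences(copies: list[str]) -> set[str]:
    """Intersect the sequence sets of all copies."""
    return set.intersection(*(sequences_of(c) for c in copies))


copies = ["SMA * LLT * EST", "SM * ALL * TES * T", "S * MAL * LTE * ST"]
for seq in sorted(consistent_sequences(copies)):
    print("-".join(seq))
```

For these three copies the intersection holds S-M-A-L-L-T-E-S-T,
S-M-A-L-L-T-E-T-S and their reverses. The enumeration grows
factorially with fragment size, so it serves only as a reference
for small cases, not as a substitute for the pairwise strategy.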
Table 1.  Syntax For Sequence Notation

<terminal>      :=  <alphanumeric>
<sequence>      :=  <terminal> [- <terminal>] ...
<fragment>      :=  <terminal> ...
<chain>         :=  <fragment> [- <fragment>] ...
<fixed chain>   :=  - <chain> -
<copy>          :=  {<chain>|<fixed chain>} [*{<chain>|<fixed chain>}] ...
3.   Strategy Used

To  help  visualize  the  sequence  determination  problem 
under  consideration,  we  can  picture  the  n  input  copies  as  being 
lined  up  vertically,  one  to  a  row,  on  a  rectangular  board.   Each 
fragment  of  a  copy  can  be  thought  of  as  a  piece  which  may  be  moved 
anywhere  within  its  row.   Any  ordering  of  the  terminals  within  a 
fragment  is  also  possible.   The  goal  is  to  position  the  fragments 
and  reorder  terminals  within  fragments  such  that  all  the  rows  spell 
out  the  same  sequence.   All  possible  sequences  must  be  found. 

The  strategy  used  to  solve  this  problem  was  chosen  with  the 
following  criterion  in  mind.   As  we  increase  the  number  of  input  copies, 
and  consequently  the  information  available  to  help  solve  the  problem, 
the  algorithm  should  not  be  degraded  in  any  way.   Because  of  this 
criterion,  strategies  which  attempt  to  compare  all  copies  simultaneously 
to  obtain  a  sequence  were  discarded.   These  strategies  can  be  degraded 
because  of  additional  storage  requirements  as  more  input  copies  are 
used.   An  example  of  this  type  of  strategy  is  one  in  which  pieces  in 
a  vertical  column  in  all  copies  are  matched  up  before  all  pieces  in 
any  one  copy  are  ordered. 

To  use  the  symmetry  of  the  problem  to  eliminate  choices, 
the  strategy  starts  from  the  left  end  of  a  copy  and  attempts  to 
place  pieces  sequentially  towards  the  right.   If  one  sequence  or 
copy is retained at any stage, then its reverse should also be
retained.  Comparisons are made to eliminate saving both a copy and its
reverse during processing.

The  final  strategy  adopted  compares  copies  in  pairs  only. 
The  basic  matching  algorithm  compares  two  copies,  copy  I  and  copy  J, 
and  determines  a  set  of  copies  which  represent  the  intersection  of 
the  sequences  represented  by  the  two  copies. 

At  any  stage  in  the  process  there  exists  a  set  of  current 
candidate  copies  obtained  from  processing  previous  input  copies,  and 
a  set  of  new  candidate  copies  being  formed  by  comparing  a  new  input 
copy  with  the  current  candidate  copies. 
The new input copy, J, is compared with each copy, I, in the set of
current candidate copies.   The set of consistent copies from each
I, J comparison is added to the set of new candidate copies.   When
I has varied over all current candidate copies, the current candidate
copies and the input copy J can be deleted from consideration.  The
new candidate copies then become the current candidate copies, the
next input copy becomes copy J, and the process continues.   After
each pairwise comparison the number of sequences represented cannot
increase.   The candidate copies which remain after all input copies
have been used represent all sequences consistent with the input.

In  the  program  SEQ1,  the  first  I  and  J  are  input  copies  1 
and  2,  and  then  J  is  incremented  sequentially.   However,  the  choice 
of  I  and  J,  i.e.  the  ordering  of  the  copies,  can  affect  the  algorithm 
drastically.   Thus,  a  better  strategy  than  that  used  in  SEQ1  should 
attempt  to  order  the  copies  to  improve  efficiency.   For  example,  we 
would  like  to  pick  an  I  and  J  which  will  generate  a  small  set  of 
new candidate copies.   One possibility is to define a measure
representing the internal ordering of a copy, and to use this measure
in a dynamic ordering procedure of the type used in sequential
pattern recognition [5].   With each pairwise comparison the number of
sequences represented is nonincreasing.   The general strategy is
outlined below.

1.  Choose an I and J.

2.  Compare copy I with copy J to find the set of
copies consistent with this pair.

3.  Replace copy I and J by the new set of copies.

4.  Go to 1 unless stopping criterion is satisfied.
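
The text leaves open what measure of internal ordering might guide
the choice of I and J in step 1. As one hypothetical choice (an
assumption, neither the procedure of [5] nor what SEQ1 does), the
Python sketch below scores an input copy by an upper bound on the
number of sequences it can represent and processes the least
ambiguous copies first. Whether this ordering actually reduces the
number of new candidate copies is not established here.

```python
from collections import Counter
from math import factorial, prod


def arrangements(fragment: str) -> int:
    """Number of distinct orderings of the terminals in one fragment."""
    repeats = prod(factorial(n) for n in Counter(fragment).values())
    return factorial(len(fragment)) // repeats


def ambiguity(copy: list[str]) -> int:
    """Upper bound on the number of sequences an input copy represents.

    The copy is a list of fragments (one-fragment chains) in any order,
    so the bound is (number of fragments)! times the orderings inside
    each fragment. Coinciding sequences are not removed.
    """
    return factorial(len(copy)) * prod(arrangements(f) for f in copy)


def processing_order(copies: list[list[str]]) -> list[int]:
    """Indices of the copies, least ambiguous first."""
    return sorted(range(len(copies)), key=lambda i: ambiguity(copies[i]))


copies = [
    ["S", "MAL", "LTE", "ST"],
    ["SMA", "LLT", "EST"],
    ["SM", "ALL", "TES", "T"],
]
print([ambiguity(c) for c in copies])
print(processing_order(copies))
```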




Pairwise Copy Comparison

The  comparison  between  two  copies,  copy  1  and  copy  2,  is 
done  exhaustively  by  trying  all  possibilities  of  chains  from  each 
copy  at  the  current  position  of  the  right  end  of  the  partially 
ordered  copy.   At  the  start,  any  chain  is  chosen  to  begin  the  ordering 
of  chains  in  copy  1  and  all  chains  from  copy  2  are  tried  to  match 
the  order  of  the  chain  in  copy  1.   Moving  from  left  to  right  at  any 
point  in  the  process,  there  are  some  initial  matching  terminals, 
some  terminals  from  the  overlapping  part  of  one  chain  in  one  copy ,  and 
the  chains  in  each  copy  which  have  not  been  used.   A  "sewing  process" 
is  carried  out  for  the  matching  of  chains.   A  chain  is  tried  in  the 
underlapping  copy  to  match  the  overlap  in  the  overlapping  copy.   If 
a  match  is  found,  the  chain  is  added  to  its  copy,  the  new  overlap  is 
calculated  and  added  as  a  fragment  to  the  new  candidate  copy, and  the 
process  continues.   There  are  6  cases  of  successful  matches  which 
are  illustrated  in  figure  1. 

At  the  beginning  of  the  process  and  whenever  a  chain  is 
complete,  i.e.  when  there  is  no  overlap,  a  new  chain  is  started 
by  picking  an  unused  chain  in  copy  1  as  the  initial  overlap. 

A stack is used to save information on each chain being
used.   The chain information is pushed into the stack in the same
order as the chains are selected in the sewing process.   Each stack
position corresponds to a node on the tree of all possible valid
chain choices.   Backup to a previous node is done by popping the stack
whenever a match fails.   See section 4 for an example of the operation
of the stack.
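
The sewing process with backup can be sketched in Python as follows.
This is an illustration under stated assumptions, not the SEQ1
listing: recursion takes the place of the explicit stack, the two
copies are input copies given as lists of fragments, fragments are
printed with their terminals sorted (so SM appears as MS), and
candidates that differ only in the order or direction of their
chains are merged. All names are hypothetical.

```python
from collections import Counter


def canonical(chains):
    """Order-free form of a copy: each chain or its reverse, chains sorted."""
    return tuple(sorted(min(tuple(c), tuple(reversed(c))) for c in chains))


def text(counter):
    """A fragment (multiset of terminals) as a sorted string."""
    return "".join(sorted(counter.elements()))


def compare(copy1, copy2):
    """All candidate copies consistent with two input copies."""
    found = set()
    unused = {1: [Counter(f) for f in copy1], 2: [Counter(f) for f in copy2]}

    def sew(overlap, side, chain, chains):
        if not overlap:                      # no overlap: the chain is complete
            done = chains + [chain] if chain else chains
            if not unused[1] and not unused[2]:
                found.add(canonical(done))
                return
            for i in range(len(unused[1])):  # start a new chain from copy 1
                frag = unused[1].pop(i)
                sew(frag, 1, [], done)
                unused[1].insert(i, frag)    # backup: restore the choice
            return
        other = 2 if side == 1 else 1        # the underlapping copy
        for i in range(len(unused[other])):
            frag = unused[other].pop(i)
            common = frag & overlap
            if common == frag:               # fragment fits inside the overlap
                sew(overlap - frag, side, chain + [text(frag)], chains)
            elif common == overlap:          # fragment extends past the overlap
                sew(frag - overlap, other, chain + [text(overlap)], chains)
            unused[other].insert(i, frag)    # backup: restore the choice

    sew(Counter(), 1, [], [])
    return found


for cand in sorted(compare(["SMA", "LLT", "EST"], ["SM", "ALL", "TES", "T"])):
    print(" * ".join("-".join(chain) for chain in cand))
```

Because chain order is ignored, the sketch reports two candidates
for this pair (the single chain MS-A-LL-T-ES-T, and the two chains
EST and MS-A-LL-T), where a program that keeps chain positions would
list SM-A-LL-T * EST and EST * SM-A-LL-T separately.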
Figure 1.   Fragment Sewing Process (columns: State, giving the Candidate Chain and Input Fragment; Addition; Next State)
4.   Program Description

A computer program, SEQ1, was written in SNOBOL4 to
solve the sequence determination problem according to the exhaustive
search strategy described above.   The program flowchart and program
listing are given in Appendix I.

Input 

One program run is the processing done on a set of input
copies assumed to be fragmented from the same sequence.   The program
will execute any number of runs on different input data.   The first
data card contains an integer, NRUNS, specifying the number of runs.
Then for each run the following cards are needed.   The first card
contains two integers separated by blanks specifying the number of
input copies for this run, NCOPS, and the maximum number of fragments
in any copy, MAXFRG.   The remaining cards contain the input copies
for the run.   Each card contains one copy except that a copy may be
continued on additional cards by starting the continue cards with a
period in column 1.   A copy is composed of fragments separated by
blanks.   Table 2 shows the input data format.
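
A reader for this card format might look like the Python sketch
below. It is not taken from the SEQ1 listing; the function names are
hypothetical, and it assumes that a continue card always begins a
new fragment rather than continuing one split across cards. MAXFRG
is read but not otherwise used here.

```python
def merge_continuations(lines):
    """Join continue cards (period in column 1) onto the preceding card."""
    merged = []
    for line in lines:
        if line.startswith(".") and merged:
            merged[-1] += " " + line[1:]
        else:
            merged.append(line)
    return merged


def parse_input(lines):
    """Parse the cards into a list of runs.

    Each run is a list of copies; each copy is a list of fragments.
    """
    cards = iter(merge_continuations(lines))
    nruns = int(next(cards))
    runs = []
    for _ in range(nruns):
        ncops, _maxfrg = map(int, next(cards).split())
        # One (merged) card per copy, fragments separated by blanks.
        copies = [next(cards).split() for _ in range(ncops)]
        runs.append(copies)
    return runs


deck = [
    "1",
    "3 4",
    "SMA LLT EST",
    "SM ALL",
    ". TES T",
    "S MAL LTE ST",
]
print(parse_input(deck))
```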

Output 

The  input  copies  are  always  printed  out  first.   Then  each 
candidate  copy  and  its  index  in  the  chain  array  is  printed  out  for 
each  J.   Candidate  copies  which  are  not  saved  because  they  are  equal 
to  another  copy  already  stored,  possibly  with  some  chains  reversed, 
are  also  printed  out.   The  candidate  copies  for  the  last  J  represent 
the  final  set  of  sequences  consistent  with  the  input. 

Three variables, NOTPRT1, NOTPRT2, and NOTSTKP, are flags
which control the amount of optional information printed out for each
run.   This information is primarily useful for debugging.   NOTPRT1 = 0
specifies that every time a chain match is obtained in the pairwise
copy comparison routine, the chain which matched, the overlap OVLAP,
and the index of the next copy where a matching chain will be sought,
Data Format For a Copy:

1st card          <fragment> <blanks> <fragment> ...
continue cards    .<fragment> <blanks> <fragment> ...
                  . <blanks> <fragment> ...

Input Data Cards

Card No.      Data on Card Beginning in Column 1

1             NRUNS
2             NCOPS MAXFRG
3             <copy 1>
              <copy 2>
                ...
              <copy NCOPS>

Cards 2 onward make up run 1; the same format is repeated for
run 2, ..., run NRUNS.

Table 2.   Input Format
NC, are printed out.   NOTPRT2 = 0 specifies that the complete stack
contents will be printed out in the backup program each time the
backup is within the same chain.   NOTSTKP = 0 specifies that the
complete stack contents will be printed out together with the value
of I whenever a new candidate copy is found.
Comparison Routine

The pairwise copy comparison routine is the part of the
program which attempts to find an unused chain in copy C to match
the overlap (OVLAP).   There are 6 cases for the output of this routine
when a successful match is made.   TFRG and TCHN are strings which
hold the fragment and chain being added to the right hand end of
copy 2 and copy 1 respectively.   C=1 means TFRG is set equal to the
OVLAP in copy 2 and an unused chain is searched for in copy 1 and set
equal to TCHN.   C=2 means TCHN = OVLAP in copy 1 and TFRG = the
first unused chain in copy 2.   Table 3 shows the values of variables
for each of the 6 cases when a match is made between TFRG and TCHN.
A new chain is being constructed in CHAINS<CK>.   In general, when
a match is found the order of the last fragment may be improved and
a new overlap may be added to the chain.   Slashes, which are eventually
replaced by dashes, are used to delimit the overlap sections of a chain
and thus show the new order after a match.

If  no  match  is  found  a  new  TFRG  or  TCHN  is  sought.   If  none 
are  left  which  are  unused  and  not  tried  previously  in  this  position 
the  backup  part  of  the  program  is  entered. 

If  the  matching  fragments  line  up  exactly,  cases  2  and  5, 
a  chain  within  the  copy  is  complete  and  either  another  chain  can  be 
started,  at  NEWCHN  in  the  program,  or  a  new  candidate  copy  is  complete. 

In the current program a new candidate copy will be saved
if no copy has already been saved which can be obtained by reversing
some chains of the new candidate.   Let the reverse of a chain, $\alpha$,
be indicated by $\hat{\alpha}$.   For example, $\alpha * \beta * \delta * \gamma$ will not be saved if
$\{\alpha|\hat{\alpha}\} * \{\beta|\hat{\beta}\} * \{\delta|\hat{\delta}\} * \{\gamma|\hat{\gamma}\}$ has already been saved.
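
One way to apply this rule is to give every candidate a key that
does not change when any of its chains is reversed. The Python
sketch below is an illustration only; the function names are
hypothetical, and sorting the terminals inside each fragment is an
added assumption that follows from fragments being unordered. Chain
positions are kept, so copies that differ in the order of their
chains are still stored separately.

```python
def chain_key(chain: str) -> tuple:
    """Direction-free key for one chain written like 'SM-A-LL-T'."""
    frags = tuple("".join(sorted(f)) for f in chain.split("-"))
    return min(frags, frags[::-1])


def copy_key(copy: str) -> tuple:
    """Key that is equal for copies differing only by reversed chains."""
    return tuple(chain_key(c.strip()) for c in copy.split("*"))


saved = {}
for cand in ["SM-A-LL-T * EST", "T-LL-A-SM * EST", "EST * SM-A-LL-T"]:
    key = copy_key(cand)
    if key in saved:
        print(f"NOT SAVED {cand} MATCHES {saved[key]}")
    else:
        saved[key] = cand
        print(f"NEW CANDIDATE {cand}")
```

With these three candidates the second is rejected as a match for
the first, while the third is saved as a new candidate.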
Data Arrays

A stack of two fields is used to store the indices of the
chains which are being used in attempting to construct a candidate
copy from COPY<1, > and COPY<2, >.  The stack is needed so that
when an attempt fails we can back up and try new chains until all
possibilities have been tried.   The first field, STK<M,1>, holds the
index of the copy, either 1 or 2.   The second field, STK<M,2>, holds
the index or the negative of the index of the chain within the copy.
A negative index means the reverse of the chain is being used.   Thus,
the chain indicated by one stack location is either
COPY<STK<M,1>, STK<M,2>> or COPY<STK<M,1>, -STK<M,2>>.
If the reverse of a chain needs to be tried it will always be tried
first.
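
The two-field stack entries can be decoded as in the Python sketch
below, which is illustrative and not taken from the SEQ1 listing.
The function name and the dictionary layout are hypothetical;
indices are 1-based as in the text. The sample data are the chains
SM-A-LL-T and EST for copy 1 and the fragments S, MAL, LTE, ST for
copy 2, with a stack in which the entry (1, -1) selects the reverse
of the first chain.

```python
def chain_from_stack_entry(copy, entry):
    """Return the chain named by a stack entry (copy_index, chain_index).

    `copy` maps a copy index (1 or 2) to a list of chains; a chain is a
    list of fragments. A negative chain index selects the chain reversed.
    """
    copy_index, chain_index = entry
    chain = copy[copy_index][abs(chain_index) - 1]
    return chain[::-1] if chain_index < 0 else chain


copy = {
    1: [["SM", "A", "LL", "T"], ["EST"]],
    2: [["S"], ["MAL"], ["LTE"], ["ST"]],
}
stack = [(1, 2), (2, 4), (2, 3), (1, -1), (2, 2), (2, 1)]
for entry in stack:
    print(entry, "-".join(chain_from_stack_entry(copy, entry)))
```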

The operation of the stack is illustrated in the output of
the example in figure 3.   Reading from left to right, from bottom to
top of the stack, the chains used in each copy are indicated, separated
by commas.   When a candidate is found all chains have been used.   The
pushing and popping of chains into the stack is shown between discovery
of new candidates.   Note for example that from the stack output after
the candidate in CHAIN<6, >, and since I=3 and J=3, we see that the
candidate resulted from comparing CHAIN<3, > with input copy 3 and that
the order of chains which matched is as shown below.

         1,1                     1,2
I        SM-A-LL-T               EST
J        S     MAL     LTE   |   ST
         2,1   2,2     2,3       2,4

Also note that a negative index appears in the stack when the next
candidate is found.   Since I=3, the index 1-1 shows that the reverse
of the chain SM-A-LL-T was tried.

Table 4 shows other arrays used by the program.   Refer to
Table 5 for a description of the variables used.
input copies:
    ITREE<1,1>,          ... ,  ITREE<1,NFRG<1>>
      ...
    ITREE<NCOPS,1>,      ... ,  ITREE<NCOPS,NFRG<NCOPS>>

candidate copies for last J:
    CHAIN<LB,1>,         ... ,  CHAIN<LB,NCHN<LB>>
      ...
    CHAIN<UB,1>,         ... ,  CHAIN<UB,NCHN<UB>>

new candidate copies:
    CHAIN<UB+1,1>,       ... ,  CHAIN<UB+1,NCHN<UB+1>>
      ...
    CHAIN<UB+NEWCAND,1>, ... ,  CHAIN<UB+NEWCAND,NCHN<UB+NEWCAND>>

J   current input copy
I   current candidate copy

copy I:               {COPY<1,1>,    ... ,  COPY<1,NENT<1>>}
copy J:               {COPY<2,1>,    ... ,  COPY<2,NENT<2>>}
copy I used flags:    {USEDCHN<1,1>, ... ,  USEDCHN<1,NENT<1>>}
copy J used flags:    {USEDCHN<2,1>, ... ,  USEDCHN<2,NENT<2>>}
complete chains in partial new candidate:
                      {CHAINS<1>,    ... ,  CHAINS<CK>}

Table 4.   Data Arrays for Storing Copies and Chains

Storage  of  Copies 

Most of the storage used is in the CHAIN array which holds
all the candidate copies.  The size of the array is MAXCAN which is
set in each run to NCOPS * (MAXFRG + 1).   This may be changed in
one assignment statement.   If MAXCAN is exceeded, the program exits
to CHNOV, prints a message and ends.   At any point in the program,
however, more storage is available by using the locations CHAIN<1, >
to CHAIN<I-1, > which hold old candidate copies that can be
discarded.  That is, the CHAIN array can be thought of as circular
storage which need never be larger than the maximum over J of the
number of old candidates which are unused plus the number of new
candidates for an input copy J.
Table 5.   Variables for Program SEQ1

The type abbreviations used are I, S, P, and R which represent
Integer, String, Pattern, and Real respectively.

Name      Type                   Description

C         I                      Index of copy we are trying to add chain to
CH        S                      Temp. location for one character
CHAIN     ARRAY(MAXCAN,MAXFRG)   Holds all candidate copies
CHAINS    ARRAY(MAXFRG)          Holds chains of partially built new candidate
CK        I                      No. of chains in CHAINS
COPY      ARRAY(2,MAXFRG)        Holds copies I and J which are being compared
DIFF      S                      Temp. for characters
FRGPAT    P                      Pattern to get one input fragment
FRGS      S                      Temp. to hold input copy
I         I                      Index of candidate copy being compared
IT        I                      Holds index of current size of CHAIN
ITREE     ARRAY(NCOPS,MAXFRG)    Holds all input copies
J         I                      Index of input copy being compared
K         I                      Index
LB        I                      Lower index in CHAIN of candidates for last J
M         I                      Pointer to top of stack (STK)
MAXCAN    I                      Maximum no. of candidate copies allowed in CHAIN
MAXFRG    I                      Maximum no. of fragments in any copy
NC        I                      Index of next copy to try to add on to (next C)
NCHN      ARRAY(MAXCAN)          Holds no. of chains in each copy in CHAIN
NCOPS     I                      No. of input copies for present run
NENT      ARRAY(2)               No. of chains in COPY<1, > and COPY<2, >
NEWCAND   I                      No. of new candidates found for present J
NFRG      ARRAY(NCOPS)           No. of fragments in each input copy
NOTPRT1   I                      Flag for no printout of chain, OVLAP, and
                                 NC on success
NOTPRT2   I                      Flag for no stack printout on backup
NOTSTKP   I                      Flag for no stack printout for new candidate
NRUNS     I                      No. of program runs with different data
OUT       I                      Temp. flag
OVLAP     S                      Holds overlap after a fragment or chain match
R         I                      Flag to indicate that reverse chain need not
                                 be tried
RF        S                      Temp.
SAVE      S                      Holds next input line during run
SF        S                      Temp. to hold overlapping fragment part
                                 after a match
STCHN     S                      Temp. to hold TCHN
STFRG     S                      Temp. to hold TFRG
STK       ARRAY(STKMAX,2)        Stack of 2 fields which holds tree backup point
STKMAX    I                      Maximum depth of STK
TCHN      S                      Holds current chain being compared from copy 1
TFRG      S                      Holds current fragment being compared from copy 2
TITLE     S                      Output variable, format 132A1
TRVS      S                      Temp.
TSG1      S                      Temp.
TSG3      S                      Temp.
TSG5      S                      Temp.
TV        I                      Temp. index
UB        I                      Upper index in CHAIN of candidates for last J
USEDCHN   ARRAY(2,MAXFRG)        Indicates which chains in COPY have been used
Examples

Figure 2 shows the output of the program for the first
example which consists of seven input copies representing fragmentation
of a protein which is twenty-one amino acids (terminals) long.
No extra debugging output is printed since the variables NOTPRT1,
NOTPRT2, and NOTSTKP were all set to 1.  After six input copies have
been processed unambiguous results are obtained, since the set of
candidate copies consists of one completely ordered sequence.   The
complete program executed 15,935 SNOBOL4 statements for this run.

Figure 3 shows an example with intermediate stack printout
obtained by setting NOTPRT2 and NOTSTKP to 0.   The initial string
was S-M-A-L-L-T-E-S-T.   The stack contents are dumped on each backup
within the same chain.   The stack printout following each candidate
printout shows the order of all the chains in copies I and J which
was used to obtain the candidate.   In this example there are two
copies in the final set of candidate copies.   The first is contained
within the second, so that the two sequences represented by the copy
in CHAIN<6, > (and their reverses) are all sequences consistent with
the input copies.   This program executed 7,^5 SNOBOL4 statements.

Figure 4 illustrates the most complete output possible
with NOTPRT1, NOTPRT2, and NOTSTKP all set to 0.   The initial string
which was fragmented was O-U-T-P-U-T.   The final set of consistent
sequences can all be represented by the copy O-U-T * PUT in CHAIN<8, >.
This gives 2 · 3! = 12 total possible sequences consistent with the
input copies.   This example illustrates that the program need only
print out the copy in CHAIN<8, > if it eliminated all the other
candidates which were contained in it from consideration.   This program
executed 5,818 SNOBOL4 statements.

Another example tried was six randomly fragmented copies
from a sequence of transfer RNA of length thirty.   The sequence was
G-G-G-C-G-U-G-U-M-G-C-G-C-G-U-A-G-U-C-G-G-U-A-G-C-G-C-M-C-U.
The program terminated due to insufficient storage for candidates
after generating fifty-four candidates from the first I-J copy pair,
and after execution of 53,582 statements.   It is apparent that the
exhaustive strategy is too slow for this case.  One important factor
here was the small size of the terminal alphabet which allows many
possible fragment matches.
COPY 1 IS   GIVE * QCC * ASVCS * LOQ * LEN * OCN
COPY 2 IS   GIVEQ * CCASV * CSLOQ * LENO * CN
COPY 3 IS   GI * VEQCCA * SVCSLO * QLEN * OCN
COPY 4 IS   GIVE * QCCAS * VCSLO * QLENO * CN
COPY 5 IS   G * IVEQ * CCASVC * SLOQL * ENOC * N
COPY 6 IS   GIV * EQC * CASVCSL * OQLE * NOCN
COPY 7 IS   GIVE * QCC * ASVCSLO * QLENOCN

GIVE-Q-CC-AVS-CS-LOQ * LEN-O-CN IS NEW CANDIDATE COPY IN CHAIN<2, >
NEW CAND. NOT SAVED IS   GIVE-Q-CC-AVS-CS-LOQ * CN-O-LEN   MATCHES #2
NEW CAND. NOT SAVED IS   LOQ-CS-AVS-CC-Q-GIVE * LEN-O-CN   MATCHES #2
NEW CAND. NOT SAVED IS   LOQ-CS-AVS-CC-Q-GIVE * CN-O-LEN   MATCHES #2
LEN-O-CN * GIVE-Q-CC-AVS-CS-LOQ IS NEW CANDIDATE COPY IN CHAIN<3, >
NEW CAND. NOT SAVED IS   LEN-O-CN * LOQ-CS-AVS-CC-Q-GIVE   MATCHES #3
NEW CAND. NOT SAVED IS   CN-O-LEN * GIVE-Q-CC-AVS-CS-LOQ   MATCHES #3
NEW CAND. NOT SAVED IS   CN-O-LEN * LOQ-CS-AVS-CC-Q-GIVE   MATCHES #3
END OF CANDIDATES FOR J = 2

GI-VE-Q-CC-A-VS-CS-LO-Q-LEN-O-CN IS NEW CANDIDATE COPY IN CHAIN<4, >
NEW CAND. NOT SAVED IS   GI-VE-Q-CC-A-VS-CS-LO-Q-LEN-O-CN   MATCHES #4
END OF CANDIDATES FOR J = 3

GI-VE-Q-CC-A-S-V-CS-LO-Q-LEN-O-CN IS NEW CANDIDATE COPY IN CHAIN<5, >
END OF CANDIDATES FOR J = 4

G-I-VE-Q-CC-A-S-V-C-S-LO-Q-L-EN-O-C-N IS NEW CANDIDATE COPY IN CHAIN<6, >
G-I-VE-Q-CC-A-S-V-C-S-LO-Q-L-N-E-O-CN IS NEW CANDIDATE COPY IN CHAIN<7, >
END OF CANDIDATES FOR J = 5

G-I-V-E-Q-C-C-A-S-V-C-S-L-O-Q-L-E-N-O-C-N IS NEW CANDIDATE COPY IN CHAIN<8, >
END OF CANDIDATES FOR J = 6

G-I-V-E-Q-C-C-A-S-V-C-S-L-O-Q-L-E-N-O-C-N IS NEW CANDIDATE COPY IN CHAIN<9, >
END OF CANDIDATES FOR J = 7


Figure 2.   Protein Sequence Determination




COPY 1 IS   SMA * LLT * EST
COPY 2 IS   SM * ALL * TES * T
COPY 3 IS   S * MAL * LTE * ST

SM-A-LL-T-ES-T IS NEW CANDIDATE COPY IN CHAIN<2, >
STK CONTENTS: 11,21,22,12,23,13,24,01,   I = 1
STK CONTENTS: 11,21,22,12,23,13,24,
STK CONTENTS: 11,21,22,12,23,13,
STK CONTENTS: 11,21,22,12,23,
SM-A-LL-T * EST IS NEW CANDIDATE COPY IN CHAIN<3, >
STK CONTENTS: 11,21,22,12,24,13,23,01,   I = 1
STK CONTENTS: 11,21,22,12,24,13,23,
STK CONTENTS: 11,21,22,12,24,
STK CONTENTS: 11,21,22,12,
STK CONTENTS: 11,21,22,
STK CONTENTS: 11,21,
NEW CAND. NOT SAVED IS   T-LL-A-SM * EST   MATCHES #3
STK CONTENTS: 12,24,22,11,21,13,23,01,   I = 1
STK CONTENTS: 12,24,22,11,21,13,23,
STK CONTENTS: 12,24,22,11,21,
STK CONTENTS: 12,24,22,11,
STK CONTENTS: 12,24,22,
STK CONTENTS: 12,24,
EST * SM-A-LL-T IS NEW CANDIDATE COPY IN CHAIN<4, >
STK CONTENTS: 13,23,11,21,22,12,24,01,   I = 1
STK CONTENTS: 13,23,11,21,22,12,24,
STK CONTENTS: 13,23,11,21,22,12,
STK CONTENTS: 13,23,11,21,22,
STK CONTENTS: 13,23,11,21,
NEW CAND. NOT SAVED IS   EST * T-LL-A-SM   MATCHES #4
STK CONTENTS: 13,23,12,24,22,11,21,01,   I = 1
STK CONTENTS: 13,23,12,24,22,11,21,
STK CONTENTS: 13,23,12,24,22,11,
STK CONTENTS: 13,23,12,24,22,
STK CONTENTS: 13,23,12,24,
STK CONTENTS: 13,23,
NEW CAND. NOT SAVED IS   T-ES-T-LL-A-SM   MATCHES #2
STK CONTENTS: 13,24,23,12,22,11,21,01,   I = 1
STK CONTENTS: 13,24,23,12,22,11,21,
STK CONTENTS: 13,24,23,12,22,11,
STK CONTENTS: 13,24,23,12,22,
STK CONTENTS: 13,24,23,12,
STK CONTENTS: 13,24,23,
STK CONTENTS: 13,24,
END OF CANDIDATES FOR J = 2

S-M-A-L-L-T-E-S-T IS NEW CANDIDATE COPY IN CHAIN<5, >
STK CONTENTS: 11,21,22,23,24,01,   I = 2
STK CONTENTS: 11,21,22,23,24,
STK CONTENTS: 11,21,22,23,
STK CONTENTS: 11,21,22,
STK CONTENTS: 11,21,
S-M-A-L-L-T-E-ST IS NEW CANDIDATE COPY IN CHAIN<6, >
STK CONTENTS: 11,21,22,23,12,24,01,   I = 3
STK CONTENTS: 11,21,22,23,12,24,
STK CONTENTS: 11,21,22,23,12,
STK CONTENTS: 11,21,22,23,
STK CONTENTS: 11,21,22,
STK CONTENTS: 11,21,

Figure 3.   Program Example with Intermediate Stack Output
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STK CONTENTS: 12,21,23,
STK CONTENTS: 12,21,
NEW CAND. NOT SAVED IS ST-E-T-L-L-A-M-S MATCHES #6
STK CONTENTS: 12,24,23,1-1,22,21,0 ,   I = 3
STK CONTENTS: 12,24,23,1-1,22,21,
STK CONTENTS: 12,24,23,1-1,22,
STK CONTENTS: 12,24,23,1-1,
STK CONTENTS: 12,24,23,
STK CONTENTS: 12,24,
STK CONTENTS: 11,21,23,
STK CONTENTS: 11,21,
NEW CAND. NOT SAVED IS ST-E-T-L-L-A-M-S MATCHES #6
STK CONTENTS: 11,24,23,1-2,22,21,0 ,   I = 4
STK CONTENTS: 11,24,23,1-2,22,21,
STK CONTENTS: 11,24,23,1-2,22,
STK CONTENTS: 11,24,23,1-2,
STK CONTENTS: 11,24,23,
STK CONTENTS: 11,24,
NEW CAND. NOT SAVED IS S-M-A-L-L-T-E-ST MATCHES #6
STK CONTENTS: 12,21,22,23,11,24,0 ,   I = 4
STK CONTENTS: 12,21,22,23,11,24,
STK CONTENTS: 12,21,22,23,11,
STK CONTENTS: 12,21,22,23,
STK CONTENTS: 12,21,22,
STK CONTENTS: 12,21,
END OF CANDIDATES FOR J = 3



Figure    3. 
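
The "NOT SAVED ... MATCHES #n" lines in Figure 3 come from a test that rejects a new candidate when each of its chains, taken position by position, equals the corresponding chain of a saved candidate read forward or in reverse. The following Python sketch illustrates that test; it is not a transcription of SEQ1. The function names are hypothetical, and it assumes that a fragment is an unordered bag of terminals in both directions (the listing compares the forward direction as exact strings).

```python
from collections import Counter


def split_candidate(text):
    """Split 'SM-A-LL-T * EST' into chains, each a list of fragments."""
    return [chain.strip().split("-") for chain in text.split("*")]


def same_chain(a, b):
    """True if chain b equals chain a read forward or in reverse order.

    Fragments are compared as unordered bags of terminals (an assumption
    of this sketch), so 'EST' and 'TSE' count as the same fragment.
    """
    def bags(chain):
        return [Counter(frag) for frag in chain]

    return bags(a) == bags(b) or bags(a) == bags(b[::-1])


def is_duplicate(new, saved):
    """Compare two candidates chain by chain, position by position."""
    x, y = split_candidate(new), split_candidate(saved)
    return len(x) == len(y) and all(same_chain(p, q) for p, q in zip(x, y))
```

With the candidates of Figure 3, `is_duplicate("T-LL-A-SM * EST", "SM-A-LL-T * EST")` is true, while `is_duplicate("EST * SM-A-LL-T", "SM-A-LL-T * EST")` is false: swapping the order of whole chains is not detected, which is why EST * SM-A-LL-T is stored as a separate candidate.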
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COPY 1 IS  OUT * PUT
COPY 2 IS  OU * TPU * T
COPY 3 IS  O * UTP * UT

COPY<2,1> = OU IS USED HERE OVLAP = T NC = 2
COPY<2,2> = TPU IS USED HERE OVLAP = PU NC = 1
COPY<1,2> = PUT IS USED HERE OVLAP = T NC = 2
COPY<2,3> = T IS USED HERE OVLAP =   NC = 2
OU-T-PU-T IS NEW CANDIDATE COPY IN CHAIN<2, >
STK CONTENTS: 11,21,22,12,23,0 ,   I = 1
STK CONTENTS: 11,21,22,12,23,
STK CONTENTS: 11,21,22,12,
STK CONTENTS: 11,21,22,
COPY<2,3> = T IS USED HERE OVLAP =   NC = 2
COPY<2,2> = TPU IS USED HERE OVLAP =   NC = 2
OU-T * PUT IS NEW CANDIDATE COPY IN CHAIN<3, >
STK CONTENTS: 11,21,23,12,22,0 ,   I = 1
STK CONTENTS: 11,21,23,12,22,
STK CONTENTS: 11,21,23,
STK CONTENTS: 11,21,
COPY<2,3> = T IS USED HERE OVLAP = OU NC = 2
COPY<2,1> = OU IS USED HERE OVLAP =   NC = 2
COPY<2,2> = TPU IS USED HERE OVLAP =   NC = 2
NEW CAND. NOT SAVED IS T-OU * PUT MATCHES #3
STK CONTENTS: 11,23,21,12,22,0 ,   I = 1
STK CONTENTS: 11,23,21,12,22,
STK CONTENTS: 11,23,21,
STK CONTENTS: 11,23,
COPY<2,2> = TPU IS USED HERE OVLAP =   NC = 2
COPY<2,1> = OU IS USED HERE OVLAP = T NC = 2
COPY<2,3> = T IS USED HERE OVLAP =   NC = 2
PUT * OU-T IS NEW CANDIDATE COPY IN CHAIN<4, >
STK CONTENTS: 12,22,11,21,23,0 ,   I = 1
STK CONTENTS: 12,22,11,21,23,
STK CONTENTS: 12,22,11,21,
COPY<2,3> = T IS USED HERE OVLAP = OU NC = 2
COPY<2,1> = OU IS USED HERE OVLAP =   NC = 2
NEW CAND. NOT SAVED IS PUT * T-OU MATCHES #4
STK CONTENTS: 12,22,11,23,21,0 ,   I = 1
STK CONTENTS: 12,22,11,23,21,
STK CONTENTS: 12,22,11,23,
STK CONTENTS: 12,22,
COPY<2,3> = T IS USED HERE OVLAP = PU NC = 2
COPY<2,2> = TPU IS USED HERE OVLAP = T NC = 1
COPY<1,1> = OUT IS USED HERE OVLAP = OU NC = 2
COPY<2,1> = OU IS USED HERE OVLAP =   NC = 2
NEW CAND. NOT SAVED IS T-PU-T-OU MATCHES #2
STK CONTENTS: 12,23,22,11,21,0 ,   I = 1
STK CONTENTS: 12,23,22,11,21,
STK CONTENTS: 12,23,22,11,
STK CONTENTS: 12,23,22,
STK CONTENTS: 12,23,
END OF CANDIDATES FOR J = 2

COPY<2,1> = O IS USED HERE OVLAP = U-T-PU-T NC = 2
COPY<2,2> = UTP IS USED HERE OVLAP = U-T NC = 2
COPY<2,3> = UT IS USED HERE OVLAP =   NC = 2
O-U-T-P-U-T IS NEW CANDIDATE COPY IN CHAIN<5, >
STK CONTENTS: 11,21,22,23,0 ,   I = 2


Figure 4.   Program Example with Complete Output
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STK CONTENTS: 11,21,22,23,
STK CONTENTS: 11,21,22,
COPY<2,3> = UT IS USED HERE OVLAP = PU-T NC = 2
COPY<2,2> = UTP IS USED HERE OVLAP =   NC = 2

O-U-T-PU-T IS NEW CANDIDATE COPY IN CHAIN<6, >
STK CONTENTS: 11,21,23,22,0 ,   I = 2

STK CONTENTS: 11,21,23,22,
STK CONTENTS: 11,21,23,
STK CONTENTS: 11,21,
COPY<2,1> = O IS USED HERE OVLAP = U-T NC = 2
COPY<2,2> = UTP IS USED HERE OVLAP = P NC = 1
COPY<1,2> = PUT IS USED HERE OVLAP = UT NC = 2
COPY<2,3> = UT IS USED HERE OVLAP =   NC = 2

O-U-T-P-UT IS NEW CANDIDATE COPY IN CHAIN<7, >
STK CONTENTS: 11,21,22,12,23,0 ,   I = 3

STK CONTENTS: 11,21,22,12,23,
STK CONTENTS: 11,21,22,12,
STK CONTENTS: 11,21,22,
COPY<2,3> = UT IS USED HERE OVLAP =   NC = 2
COPY<2,2> = UTP IS USED HERE OVLAP =   NC = 2

O-U-T * PUT IS NEW CANDIDATE COPY IN CHAIN<8, >
STK CONTENTS: 11,21,23,12,22,0 ,   I = 3

STK CONTENTS: 11,21,23,12,22,
STK CONTENTS: 11,21,23,
STK CONTENTS: 11,21,
COPY<2,2> = UTP IS USED HERE OVLAP =   NC = 2
COPY<2,1> = O IS USED HERE OVLAP = U-T NC = 2
COPY<2,3> = UT IS USED HERE OVLAP =   NC = 2

PUT * O-U-T IS NEW CANDIDATE COPY IN CHAIN<9, >
STK CONTENTS: 12,22,11,21,23,0 ,   I = 3

STK CONTENTS: 12,22,11,21,23,
STK CONTENTS: 12,22,11,21,
STK CONTENTS: 12,22,
COPY<2,3> = UT IS USED HERE OVLAP = P NC = 2
COPY<2,2> = UTP IS USED HERE OVLAP = UT NC = 1
COPY<1,1> = OU-T IS USED HERE OVLAP = O NC = 2
COPY<2,1> = O IS USED HERE OVLAP =   NC = 2

NEW CAND. NOT SAVED IS UT-P-T-U-O MATCHES #7

STK CONTENTS: 12,23,22,1-1,21,0 ,   I = 3

STK CONTENTS: 12,23,22,1-1,21,
STK CONTENTS: 12,23,22,1-1,
STK CONTENTS: 12,23,22,
STK CONTENTS: 12,23,
COPY<2,2> = UTP IS USED HERE OVLAP =   NC = 2
COPY<2,1> = O IS USED HERE OVLAP = U-T NC = 2
COPY<2,3> = UT IS USED HERE OVLAP =   NC = 2

NEW CAND. NOT SAVED IS PUT * O-U-T MATCHES #9

STK CONTENTS: 11,22,12,21,23,0 ,   I = 4

STK CONTENTS: 11,22,12,21,23,
STK CONTENTS: 11,22,12,21,
STK CONTENTS: 11,22,
COPY<2,3> = UT IS USED HERE OVLAP = P NC = 2
COPY<2,2> = UTP IS USED HERE OVLAP = UT NC = 1
COPY<1,2> = OU-T IS USED HERE OVLAP = O NC = 2
COPY<2,1> = O IS USED HERE OVLAP =   NC = 2

NEW CAND. NOT SAVED IS UT-P-T-U-O MATCHES #7

STK CONTENTS: 11,23,22,1-2,21,0 ,   I = 4

STK CONTENTS: 11,23,22,1-2,21,
STK CONTENTS: 11,23,22,1-2,
STK CONTENTS: 11,23,22,
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STK CONTENTS: 11,23,
COPY<2,1> = O IS USED HERE OVLAP = U-T NC = 2
COPY<2,2> = UTP IS USED HERE OVLAP = P NC = 1
COPY<1,1> = PUT IS USED HERE OVLAP = UT NC = 2
COPY<2,3> = UT IS USED HERE OVLAP =   NC = 2

NEW CAND. NOT SAVED IS O-U-T-P-UT MATCHES #7

STK CONTENTS: 12,21,22,11,23,0 ,   I = 4

STK CONTENTS: 12,21,22,11,23,
STK CONTENTS: 12,21,22,11,
STK CONTENTS: 12,21,22,
COPY<2,3> = UT IS USED HERE OVLAP =   NC = 2
COPY<2,2> = UTP IS USED HERE OVLAP =   NC = 2

NEW CAND. NOT SAVED IS O-U-T * PUT MATCHES #8

STK CONTENTS: 12,21,23,11,22,0 ,   I = 4

STK CONTENTS: 12,21,23,11,22,
STK CONTENTS: 12,21,23,
STK CONTENTS: 12,21,

END OF CANDIDATES FOR J = 3
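
Each "IS USED HERE OVLAP = ... NC = ..." line in Figure 4 reports one fragment-against-chain match. The Python sketch below follows the matching rule of part P9T as it reads in the listing: whole fragments of the chain are consumed from the left, and a fragment is treated as an unordered bag of terminals. The function name `match_chain` and its return convention are assumptions made for illustration; the order of terminals inside a leftover fragment may differ from what SEQ1 prints.

```python
def match_chain(tchn, tfrg):
    """Match fragment tfrg against the left end of chain tchn.

    Returns (ovlap, nc) on success, or None when the match fails.
    nc == 2: the chain was longer or equal; ovlap is the rest of the chain.
    nc == 1: the fragment was longer; ovlap is the rest of the fragment.
    """
    frags = tchn.split("-")
    remaining = list(tfrg)              # terminals of tfrg not yet matched
    for pos, frag in enumerate(frags):
        diff = []                       # terminals of frag absent from tfrg
        for k, ch in enumerate(frag):
            if ch not in remaining:
                diff.append(ch)
                continue
            remaining.remove(ch)
            if not remaining:           # tfrg is used up inside this fragment
                head = "".join(diff) + frag[k + 1:]
                parts = ([head] if head else []) + frags[pos + 1:]
                return "-".join(parts), 2
        if diff:                        # fragment ended with unmatched terminals
            return None
    return "".join(remaining), 1        # chain is used up, tfrg has leftovers
```

For example, `match_chain("OUT", "OU")` gives `("T", 2)` and `match_chain("T", "TPU")` gives `("PU", 1)`, the first two steps of the trace that builds OU-T-PU-T.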
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Appendix    1.      SEQ1  Program  Listing    and  Flowcharts 


PROGRAM SEQ1

        &STLIMIT = 100000
        NOTSTKP = 1
        NOTPRT1 = 1
        NOTPRT2 = 1
        &MAXLNGTH = 160
        &DUMP = 1


        OUTPUT('TITLE',6,'(132A1)')

*  INPUT PART - P1T
*  INPUT COPIES GO TO ITREE ARRAY
*  INPUT CARD 1 CONTAINS NO. OF RUNS ONLY

NRU.NS            =    TRIM!  INPUT)  *Ft?iIS 

*  GET NO. OF COPIES AND MAX. NO. OF FRAGMENTS IN ANY COPY
*  FOR THIS RUN FROM NEXT CARD

TSG1    =    TRIM! INPUT)  '     *       JFLDATAERR) 

RERUN  &ANCHOR  =    I  , 

TSG1TITLEBREAM.     •)     .    NCOPS    SPANC     •»  .FlnATAFRR)_ 

BiLEAKJ J_  'i     .    MAXFRG ;FinATAFRR) 

        ITREE = ARRAY(NCOPS ',' MAXFRG)
        NFRG = ARRAY(NCOPS)
        I = 1

FRrPAT    -    BREAK!'     ')     •    TSG3  SPANl'     ■  1  — -- 

*  BUILD UP FRAGMENTS FROM INPUT AS FRAGMENTS SEP. BY BLANKS

TSG1    =    TJUMliNPUT) L_J :F(nftTAFRR) 

NXTCOPY    FRGS  =    TSG1 

J  =  1 

CONTCP       TSG1  =    TRIM(INPUT)        '     ' 

*  OATA    CONTINUATION    CARD    HAS    .     IN    COL.    1  cfl,, 

FRGS  ^ERULISGI 4^^ 1 

L?  FRGS  FRGPAT       =  :PIL3) 

ITREE<I,J>    =    TSG3 
J  =    J    +     I  mlL£1 

L3       NFRG<I>   =  J  -  1  .cfwv-rrnPY) 

I  =  LT(I,NCOPS)  I  ♦  1  .S(NXTCOPY) 

*  NOW ITREE<I,J> CONTAINS FRAGMENT J OF COPY I,
*  FOR I = 1 TO NCOPS, J = 1 TO NFRG<I>.
*  ALL INPUT FOR THIS RUN SHOULD BE IN ITREE
*  SAVE CONTAINS THE NEXT INPUT LINE

SAVE  =  TSG1 

*  NOW    PRINT    OUT    ITREE  J 

J =_J , 

P1T1  J  =     l  i|D1T,n 

TSGl  =       ITREE<I,1>  :iPlTll) 

PIT2  TSGl  =    TSGl        '     *    •        ITREE<I,J>  .CIP1T„ 

PlT11  j  =    LT(J^NFRG<I>J     J    ♦    1  :S(P1T2) 

OUTPUT       =        '    COPY     •     I     •     IS        '       TSGl  .clolTll 

OUTPUT 
* 

*  DATA DEFINITION PART - P2T
*  INITIALIZE MAX. NO. OF ALLOWED CANDIDATE COPIES
        MAXCAN = NCOPS * (MAXFRG + 1)
*  DEFINE ARRAYS FOR CANDIDATE COPIES AND NO. OF CHAINS IN EACH
        CHAIN = ARRAY(MAXCAN ',' MAXFRG)
        NCHN = ARRAY(MAXCAN)
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PART      P4T 

INITIALIZE    FOR    NEW    J.    COPY    J    BECOMES    COPY    2 


*  DEFINE ARRAYS FOR CURRENT COPIES BEING COMPARED, NO. OF
*  ENTRIES IN EACH, AND CHAINS USED IN EACH
        COPY = ARRAY(2 ',' MAXFRG)
        USEDCHN = ARRAY(2 ',' MAXFRG)
        NENT = ARRAY(2)

ewiS  ""Y^r^m^  CU"RENT  CAN°*°"E  8E,NG  8U,LT- 

*  DEFINE STACK FOR SAVING TREE BACKUP POINTS
        STKMAX = (2 * MAXFRG) + 4
        STK = ARRAY(STKMAX ',' 2)

*  INITIALIZATION PART  P3T
*  INPUT COPY 1 BECOMES CANDIDATE COPY 1
        LB = 1
        UB = 1
        NCHN<1> = NFRG<1>
        K = 1
P3T1    CHAIN<1,K> = ITREE<1,K>
        K = LT(K,NCHN<1>) K + 1                         :S(P3T1)

'foT/^,^!^^ 
J  =    I"" 


51 


52 


        K = 1
P4T1    USEDCHN<2,K> = 0
        COPY<2,K> = ITREE<J,K>
        K = LT(K,NENT<2>) K + 1                         :S(P4T1)

*  FIRST I BECOMES FIRST CANDIDATE
        I = LB
*  NEWCAND WILL COUNT ALL NEW CANDIDATES FOR A FIXED J
*  COPY I BECOMES COMPARISON COPY 1
*  PART P5T
        NENT<1> = NCHN<I>
        K = 1
P5T1    USEDCHN<1,K> = 0
        COPY<1,K> = CHAIN<I,K>
        K = LT(K,NENT<1>) K + 1                         :S(P5T1)

JZlIAnALTFE?n^N?IDATES    C0NS^rENT    WITH  INPUT    COPY    J    AND                          " 

2    AND    1    RESPECTIVELY,    WILL    BE    FOUND.  

PART    P6T 

CLEAR    STACK 

M  =    1         _.. 

RESET    CHAIN    COUNTFR    FOR    CURRENT    COPY  ^ 

START    WITH    FIRST    FRAGMENT    OF    COMP.    COPY       1  *6— 

K  =     i 


67 


P7T 

TRY    TO    BUILD    A    NEW    CHAIN    IN    THE    NEW    COPY 

WCHN         u^u^fR]L-k:LLB-f-nMP-    mPY     '     '    rHrtT"   *- 
m.HN  USFDr.HNkT  1  -  K\        =     i 


USE0CHN<1,K>       =    1 

PUSH    STACK  68 

STK<M,1>       =    I 

69 
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* 
* 


STK<*.2>      !    *T(MfSTKMAX,    MVf    ~  :>(STKOVI  7! 

*  CLEAR NEW TOP OF STACK POSITION C
        STK<M,1> = 0
*  TRY TO ADD NEXT CHAIN TO COPY 2
*  OVLAP IS OVERLAP WHICH MUST MATCH WITH NEXT CHAIN
        OVLAP = COPY<1,K>
*  MUST ADD OVLAP TO CURRENT NEW CHAIN
        CHAINS<CK> = '/' OVLAP
*  NOW ENTER GENERAL SEGMENT TO BUILD ONTO THE NEW COPY BY
*  ADDING A CHAIN FROM COPY C WHICH MATCHES WITH OVLAP.


* 

*  PART  P8T 


*  IT IS THE INDEX TO THE CURRENT CHAIN BEING TRIED IN
*  COPY C. R = 1 IF THE REVERSE CHAIN HAS BEEN TRIED OR DOES
*  NOT NEED TO BE TRIED

MIDCHN    IT  =  0  7 

MTn  IT  I    LT(IT,NENT<C>)     IT    ♦    I  :F(BACKUP) 

**  GQ    10  BAOOIP-lE-AU^HAlNS.-iiAV£-BFFN   TRIED  ? 

EQ(USE0CHN<C,IT>,0)  IfIPBTSI  8 

TRYFWD  EQ(C,1)  :S(P8T3)  8 

*  FIND IF THE REVERSE FRAGMENT CHAIN EQUALS THE FORWARD
*  FRAGMENT CHAIN OF COPY<1,IT>. IF IT DOES WE DO NOT
*  HAVE TO TRY THE REVERSE, SO WE SET R = 1
*  FIRST REVERSE THE WHOLE CHAIN, PUT IT IN TCHN.
*  NEED &ANCHOR = 0 HERE
        &ANCHOR = 0

TooGa  check'tS^et'oK'fast  if  it  is  only  I  fragment.  _      j 

TSGl       BREAiLLL^yJ- ^8T3)    ~ 

R  =     1                                                                                                                                                        I 

I4T7               TCHN  =                                                                                     ,ri  iatoi                                             I 

{„,               ISG,  BREAK..-..    .    SF-'     =                                    ;«!♦{«• 

TCHN  =     SF     '-*     TCHN                                                                                                                    | 

I4T?                TCHN  =    TSG1        •-•       TCHN                                                                                                    , 

TCHN       RJID-S-IJ.J -""*       ~ ■ ' " 

*  NOW    SEE     IF    TCHN    =    TSGl    FRAGMENT    WISE.  ,1 

CUT                     =    0  < 
TRVS                  =    TCHN 

TSGl                 =    C0PY<1,IT>  .F(I4T5) 

I4T3  TSGl  flREAKC-')     .     SF       •-•  =  .FCI4T51  , 

TRVS         RPPAXI   '-')      .     RF !-J ^ _ — 

THECK  IF  SF  =  RF  .  BOTH  ARE  FRAGMENTS.    ^^^ 

I4T4      SF  LEN(l)  .  CH   -  • S( I  ATA) F ( P 8T4) 

RF  CH 
I AT5      SF  =  TSGl 

RF         =  TRVS 

QUI ^-J 


_L4.IAJ_ 


:F(P8T4) 

I4T6      IDENT(RF)  :F(I4T3> 

EQ(0UT,1) 
*  REVERSE  IS  EQUAL*  SO  SET  R  =  1 

R  =     1 


IP8TA) 


PBT3                TCHN                  =    C0PY<1,IT>  # 

P8TA  -IE&G =    HVI  AP • 

PBTS  R  =     l 

TCHN        =  OVLAP 
TFRG       =  C0PY<2,IT> 
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*  PART P9T
*  TRY TO MATCH TCHN WITH TFRG. EITHER IT FAILS OR
*  IT IS SUCCESSFUL AND 3 CASES ARE INDICATED BY: NC = 1,
*  NC = 2, OR OVLAP = NULL.

        &ANCHOR = 0
        DIFF =
        OUT = 0
        STCHN = TCHN
        STFRG = TFRG
*  GET THE LEFTMOST FRAG. FROM TCHN.
P9T1    TCHN BREAK('-') . SF '-' =                      :S(P9T2)
*  SET OUT = 1 IF SF IS LAST FRAG IN TCHN
        SF = TCHN
        TCHN =
        OUT = 1
P9T2    SF LEN(1) . CH RPOS(0) =                        :F(P9T12)
*  CH MUST MATCH WITH TFRG
        TFRG CH =                                       :F(P9T11)
*  FAILURE EXIT IS P9T7
*  CHECK FOR EMPTY TFRG
        IDENT(TFRG)                                     :F(P9T2)
*  TCHN IS LONGER OR EQUAL AND MATCH SUCCEEDED
        SF = DIFF SF
        IDENT(SF)                                       :F(P9T22)
        OVLAP = TCHN                                    :(P9T21)
P9T22   IDENT(TCHN)                                     :F(P9T23)
        OVLAP = SF                                      :(P9T21)
P9T23   OVLAP = SF '-' TCHN
P9T21   NC = 2                                          :(CEX2)
P9T11   DIFF = DIFF CH                                  :(P9T2)
P9T12   IDENT(DIFF)                                     :F(P9T7)
        EQ(OUT,1)                                       :F(P9T1)
*  TFRG IS LONGER AND MATCH SUCCEEDED.
        OVLAP = TFRG
        NC = 1                                          :(CEX1)
*  MATCH FAILED
P9T7    EQ(R,0)                                         :F(NXTIT)
        R = 1                                           :(TRYFWD)

*  MATCH SUCCEEDED
*  WE HAVE FOUND A MATCH
*  ON ENTRANCE TO THIS PORTION OF CODE THROUGH
*  CEX1 OR CEX2, WE HAVE OVLAP AND NC CALCULATED.
*  WE NOW ALTER CHAINS<CK> AS APPROPRIATE FOR ONE OF
*  6 CASES OF SUCCESS AND THEN WE GO TO SC.

FQ«C.2I  :S(CM5) 

CHAINS<CK>    STFRG^  RPOS(O)       =  l36 

137 

m. 

139 
14Q 


TSG1  =  :(CM4) 

CHAINS  COO:    =    CHAIN <;<TK>     «/«  PViAP ;  (SCI  _ 

IDFNT(OVLAP)  :F(CM6)~ 

E0(r-»2)  :SCSC) 

CHAINS<CK>  STFRG  RPOS(O)  =  [^ 

142 

143 


CHAINS<CK>   =  CHAINS<CK>  STCHR  MSC) 

EQ<C,2)  :S(CM7) 

CHAJNSXCJ1>  .STFRG  RPflStn)  =_ :(f.Mft> ,u 

CHAINS<CK>  STCHN  PPOS(O)  = 
STCHN  TCHN  RPOS(O) 

STCHN   •-•   RPOS(O)   = 


145 
146 
147 
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1< 


* 


Cy2  TSGl       BREAK<»-«>        '-'        " If, 

STrHN  tsgi   RpnsiOl — h ,F(CM4)  i«. 

CM3  SF      LEN(l)     .    CH  =  JLCM31  1,! 

CHMNS<CK>    ^    CHAINS<CK>    STCHN    TSGl V    OVLAP  U 

SC  STK<M,1>      =    C  :S(N0T1)  If 

NQT1                USEDCHN<C, IT>       =       1  13 

1T                       =    FQ<R»0>    -IT  1= 

STK<M,?>       =     IT 

*  MUST CLEAR OUT TOP OF STACK POSITION C SINCE THIS IS USED
*  IN BACKUP.
        STK<M,1> = 0

C  ;^C -.F(MIDCHN)  l 

*  RETURN TO TRY FOR A NEW ADDITION TO COPY C
*  ELSE A CHAIN IS COMPLETE SINCE OVLAP IS EMPTY
*  SEE IF WHOLE NEW COPY IS COMPLETE

P9T8  K  =     l  .ctPlflTl 

PQT9      E_01US£-DCHN.<Ll^JO-^OJ ruAiN 

r        NOT  COMPLETE,  RETURN  TO  TRY  FOR  A  NEW  CHAIN^^ 

plon     ?         I  CLT,;,NENT<1>.  K  ♦  1  =S(P9T9. 

:         \?£e%fl™M_K^™V»^ 

        NEWCAND = NEWCAND + 1
        IT = UB + NEWCAND

*  ERROR EXIT TO CHNOV IF CHAIN STORAGE IS USED UP

ffife^"2££&3"&^ 

*  BFLCW    CHAIN<I>    CAN    BE    USED    NOW.  LCHNOV ) 
GT(IT,MAXCAN) 

*  STORE THE CHAINS OF THE NEW COPY AT CHAIN<UB + NEWCAND, >
*  AND PUT THEM IN TSG1 FOR LATER PRINTOUT.

TSGl  J 

K  =    1 

&ANCHOR    =    0 
PICT?  CHAIN<IT,K>       =    CHAINb<K>  :SIP10T3) 

PICT3  CHAIN<IT,K>    •/•        =  :S(P10T<»> 


]i 


P10T4  CHAIN<IT,K> 

CHAINCLUJa PnSI 


Ten    -     TSGl     '     *     '    CHAIN<IT,K> 

K  =    LT(K,CK1    K    ♦    1  ,M^ 


* 
* 
* 
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*  IF WE HAVE A-B-C WE DELETE A-B REVERSE-C REVERSE.
*  BUT WE DO NOT DETECT C-B-A AS DELETABLE.
*
*  DON'T CHECK IT AGAINST ITSELF
        EQ(NEWCAND,1)                                   :S(P10T15)
        TV = UB + 1
P10T10  EQ(NCHN<TV>,CK)                                 :F(P10T14)
        K = 1
P10T11  TSG3 = CHAIN<TV,K>
        TSG5 = CHAIN<IT,K>
        EQ(SIZE(TSG3),SIZE(TSG5))                       :F(P10T14)
        IDENT(TSG3,TSG5)                                :S(P10T13)
*  CHECK THE REVERSE CHAIN.
        OUT = 0
P10T16  TSG3 BREAK('-') . SF '-' =                      :S(P10T17)
        SF = TSG3
        OUT = 1
P10T17  TSG5 LEN(1) . CH RPOS(0) =
        SF CH =                                         :F(P10T14)
        IDENT(SF)                                       :F(P10T17)
        EQ(OUT,1)                                       :S(P10T13)
        TSG5 '-' RPOS(0) =                              :S(P10T16)F(P10T14)
P10T13  K = LT(K,CK) K + 1                              :S(P10T11)
*  DELETE THE NEW CANDIDATE
        OUTPUT = '  NEW CAND. NOT SAVED IS ' TSG1 '  MATCHES #' TV
        NEWCAND = NEWCAND - 1                           :(STKP1)
P10T14  TV = LT(TV,UB + NEWCAND - 1) TV + 1             :S(P10T10)
P10T15  OUTPUT = TSG1 ' IS NEW CANDIDATE COPY IN CHAIN<' IT ', >'
STKP1   EQ(NOTSTKP,1)                                   :S(BACKUP)
        TV = 1
        TSG5 =
SLBL1   TSG5 = TSG5 STK<TV,1> '' STK<TV,2> ','
        TV = LT(TV,M) TV + 1                            :S(SLBL1)
        OUTPUT = ' STK CONTENTS: ' TSG5 '   I = ' I

*  NOW BACKUP TO GET OTHER ALTERNATIVES
*
*  BACKUP PART: POP INDICATOR OF LAST ADDED CHAIN TO
*  TOP OF STACK
BACKUP  M = M - 1
        C = STK<M,1>
        IT = STK<M,2>
        LT(IT,0)                                        :F(P11T3)
        R = 1
        IT = -IT                                        :(P11TA1)
P11T3   R = 0
P11TA1  USEDCHN<C,IT> = 0
*  DELETE LAST /OVLAP FROM CHAINS<CK>. &ANCHOR = 0
        TSG1 = CHAINS<CK>
        IDENT(OVLAP)                                    :S(P11TA2)
        &ANCHOR = 0
        TSG1 '/' OVLAP RPOS(0) =                        :F(ERRP)
P11TA2  IDENT(TSG1)                                     :S(P11T4)
        TSG1 = EQ(STK<M + 1,1>,C) TSG1 OVLAP
        EQ(NOTPRT2,1)                                   :S(NOT2)
        TV = 1
        TSG5 =
SLBL    TSG5 = TSG5 STK<TV,1> '' STK<TV,2> ','
        TV = LT(TV,M) TV + 1                            :S(SLBL)
        OUTPUT = ' STK CONTENTS: ' TSG5
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-.,*.*.    ,c    pmpty    TRANSFER-  TO   BACKUP    TO   EARLLER   CHAI N 

J  CTHERwIsVgET "^^^CHAIN    FOR    PREVIOUS    MATCH         ^ 

ml  2  CHAlli££CKi h I-SGJ — 

PUTA  &ANCHOR    =    I  _  :SIP11T21 

P11T2  TSGl       BREAKC/M        •/'       - 

OVLAP  =    TSGl  :SINXTIT)FITRYFWD) 

nniSu?1     -       •    ERROR    IN    MATCH    AT    PUT1     ■        MENOI 

put,      i?  =1LIUTAfNT<l>)  IT  *  if  5mi 

EQ(USEDCHN<1.IT>,01  MNEWCHN) 

Otherwise  back  up  to  an  earlier  chain  if"^. 

P11T5  CK  =    MEICKtll    CK  ..RAfXUP> 

OVLAP—     -  -5- "  ' 

••      saaa  &:sh™=^— 

:      st  sjseSo  ■•««  »«'"»»■  '"""'is,,..., 

FQ(NEWCAND,0) 

        LB = UB + 1
        UB = UB + NEWCAND
        OUTPUT = ' END OF CANDIDATES FOR J = ' J
*  INCREMENT J IF POSSIBLE, ELSE RERUN IF ANY MORE RUNS

j  =    LT(J,NCOPS)    J    ♦    1  *M 

■JSSis"  s"=  cnSRUNsTins-iisi "^7",'" ~7?<«r,F,TNln 

FRRH  "output    -    •     ERROR:    NO   NU    CANDIDATES    •  «0 

STKnv  OUTPUT    .    ■    ERROR:    S    K  t  t 

SUSS""      ^OUTPUT"-    •    S'exJfEDEO    AT    •    N«C»N    _"«««>» 

END
NO ERRORS DETECTED DURING COMPILATION
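
The statements at SLBL and SLBL1 print each stack entry as the copy number C followed directly by the chain index IT and a comma, so "12" stands for C = 1, IT = 2. The Python sketch below decodes one "STK CONTENTS" line of Figures 3 and 4 on that basis. The function name is hypothetical. Two further readings are assumed rather than confirmed by the listing: a negative index such as "1-1" marks a chain used in reverse, and a bare "0" is the cleared top-of-stack slot.

```python
import re


def parse_stack_line(line):
    """Decode 'STK CONTENTS: 12,23,22,1-1,21,0 ,   I = 3'.

    Returns (entries, i) where entries is a list of
    (copy, index, used_in_reverse) tuples and i is the candidate number
    printed after 'I =', or None when the line carries no 'I =' part.
    """
    body = line.split(":", 1)[1]
    found = re.search(r"I\s*=\s*(\d+)", body)
    candidate = int(found.group(1)) if found else None
    if found:
        body = body[:found.start()]
    entries = []
    for token in body.split(","):
        token = token.strip()
        if not token or token == "0":   # blank tail or cleared top slot
            continue
        copy = int(token[0])            # C is a single digit, 1 or 2
        index = int(token[1:])          # IT, negative when reversed
        entries.append((copy, abs(index), index < 0))
    return entries, candidate
```

Because C is always 1 or 2, taking the first character as C keeps the decoding unambiguous even when IT has more than one digit.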
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GET 
NRUNS


INITIALIZE 
NCOPS, MAXFRG,
ITREE, NFRG,
FOR  NEW  INPUT. 

&ANCHOR  =  1 


READ  INPUT 

COPIES  INTO 

ITREE. 

PRINT  ITREE. 


DEFINE  DATA 

ARRAYS  AND 

STACK. 


SET  CANDIDATE  COPY  1  =  INPUT  COPY 

LB  =  UB  =  1 

NCHN<1> = NFRG<1>

CHAIN<1,K>   =   ITREE<1,K> 

FOR   K  =  1,2,    ...,NCHN<1> 


INITIALIZE  FOR  NEW  J; 

NENT<2>  =  NFRG<J> 

FOR  K  =  1,2,  ...,NENT<2> 

USEDCHN<2,K>  =  0 

COPY<2,K> = ITREE<J,K>


F2 


INITIALIZE  FOR  NEW  I: 

NENT<1>  =  NCHN<I> 

FOR  K  =  1,2,  ...,  NENT<1> 

USEDCHN<1,K>  =  0 

COPY<1,K> = CHAIN<I,K>


INITIALIZE 

STACK  AND  CHAINS 

M  =  1 

CK  =  1 

K  =  1 


I_* 


USEDCHN<1,K> = 1
PUSH STACK
CLEAR TOP OF STACK


C = 2

OVLAP = COPY<1,K>

CHAINS<CK> =

'/' OVLAP


( MIDCHN )
( TRYFWD )


IT  =  0 
R  =  0 


NXTIT 


YES 


IT  =  IT  +  1 


YES 


YES 


YES 


TCHN =
COPY<1,IT>


R = 1

TCHN = OVLAP

TFRG = COPY<2,IT>


&ANCHOR = 0

TCHN =

REVERSE(COPY<1,IT>)


TFRG =
OVLAP


YES 


R  =  1 
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&ANCHOR = 0
DIFF = 0

OUT = 0


DELETE LEFTMOST

FRAGMENT FROM

TCHN. PUT IT

IN SF.


TCHN

EMPTY


YES


NO


OUT = 1


YES 


YES 


*MATCH SUCCEEDS

TFRG IS LONGER.

OVLAP = TFRG

NC = 1


YES


DELETE STFRG

ADD

STCHN '/' OVLAP

TO CHAINS<CK>

*CASE 6


YES 


*MATCH FAILS   YES
R = 0

NO


( NXTIT )


ADD '/' OVLAP

TO

CHAINS<CK>

*CASE 3


REPLACE STFRG

WITH STCHN

IN CHAINS<CK>

*CASE 5


R = 1


( TRYFWD )


*MATCH SUCCEEDS
TCHN IS LONGER OR =
SF = DIFF SF


OVLAP =
SF '-' TCHN


NC = 2


YES 


YES 


CH =
RIGHTMOST
CHARACTER IN SF
DELETE IT.


DIFF =

DIFF CH


YES 


DELETE STCHN
FROM
CHAINS<CK>
*CASE 1


DELETE STFRG

FROM

CHAINS<CK>

*CASE 4


PUT ALL COMPLETE

MATCHING FRAGMENTS

FROM LEFT END

INTO STCHN


ADD
STCHN TSG1
'/' OVLAP
TO CHAINS<CK>


TSG1 =
PART OF LAST

FRAGMENT
WHICH MATCHED
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TV  - 

i 

i 

YES   . 

UB+NEWCAND  TI 


NO  .  CHAIN  <TV ,  K>\  YES 
s^  REVERSE  OF 
CHAIN<IT.K> 


\ 

NO 

DELETE 
CANDIDATE  COPY 
NEWCAND  = 
NEWCAND  -1 

PRINT  OUT 

DELETED 
CANDIDATE 

PRINT  OUT 
NEW  CANDIDATE 
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*POP STACK

M = M-1

C = STK<M,1>

IT = STK<M,2>


R = 1
IT = -IT


YES


&ANCHOR = 0

DELETE

'/' OVLAP

FROM END OF

CHAINS<CK>


( NXTIT )


YES


CHAINS<CK> =
CHAINS<CK> OVLAP


YES


( NEWCHN )


K = IT


PRINT OUT STK

IF

NOTPRT2 ≠ 1


&ANCHOR = 1

OVLAP = STRING

IN CHAINS<CK>

AFTER '/'


*GET

PREVIOUS CHAIN

CK = CK - 1

0VLAP  -  <t 


VcK^   1 


NO 
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. 


( STKOV )


LB = UB+1
UB = UB+NEWCAND


*GET NEXT
CANDIDATE

TO COMPARE WITH
INPUT COPY J.

I = I + 1


OUTPUT

'ERROR: NO

NEW CANDIDATES'

NO CONSISTENT

COPIES GENERATED


*ALL NEW CANDIDATES FOR THIS J ARE

IN CHAIN<K,IT> FOR

K = LB, LB+1, ..., UB

IT = 1,2, ..., NCHN<K>.

OUTPUT = 'END OF CANDIDATES FOR J = ' J


*GET NEXT
INPUT COPY.

J = J+1


YES


*NEW DATA RUN
NRUNS = NRUNS-1


OUTPUT =
'ERROR: STK
IS FULL'


( RERUN )


OUTPUT =
'MAXCAN
EXCEEDED AT'
MAXCAN


( DATAERR )


OUTPUT =
'ERROR IN
INPUT DATA'


( END )
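
The outer control flow of the flowcharts can be summarised as follows: input copy 1 becomes candidate 1, and each later input copy J is compared with every candidate in the range LB to UB, after which that range is moved to the candidates just created. The Python sketch below shows only this bookkeeping. The name `reconstruct` and the callable `extend`, which stands in for the whole chain-matching and backup search, are hypothetical, and the duplicate test is reduced to plain equality rather than the reverse-chain test of the listing.

```python
def reconstruct(copies, extend):
    """Outer loop over input copies, using 0-based indices.

    copies: the input copies in order.
    extend(candidate, copy): returns the list of new candidates that are
        consistent with both arguments (supplied by the caller).
    Returns the candidates produced for the last input copy.
    """
    candidates = [copies[0]]            # input copy 1 is candidate copy 1
    lb = ub = 0
    for j in range(1, len(copies)):
        new = []
        for i in range(lb, ub + 1):     # every candidate from the last round
            for cand in extend(candidates[i], copies[j]):
                if cand not in new:     # simplified duplicate test
                    new.append(cand)
        if not new:
            raise ValueError("no new candidates for copy %d" % (j + 1))
        candidates.extend(new)
        lb = ub + 1                     # LB = UB + 1
        ub = ub + len(new)              # UB = UB + NEWCAND
    return candidates[lb:ub + 1]
```

Keeping all candidates in one list and moving the LB/UB window mirrors the way the CHAIN array is filled, which is also why the candidate numbers in Figures 3 and 4 keep increasing from one J to the next.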
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